UW-ISL Document Image Analysis Toolbox: An Experimental Environment

Authors

  • Jisheng Liang
  • Richard Rogers
  • Robert M. Haralick
  • Ihsin T. Phillips
Abstract

A document image analysis toolbox, including a collection of data structures and algorithms to support a variety of applications, is described in this paper. An experimental environment is built to allow developers to develop, test and optimize their algorithms and systems. Appropriate, quantitative performance metrics have been developed for each kind of information a document analysis technique infers. The performance of each algorithm has been evaluated based on these metrics and on the UW-III document image database, which contains a total of 1600 English document images randomly selected from scientific and technical journals.

1 Introduction

The goal of document image analysis is to transform document images into a hierarchical representation of their structure and content. Document image analysis techniques have to be proved out on data sets of significant size, and there must be suitable performance metrics for each kind of information a document understanding technique infers.

In the Intelligent Systems Laboratory (ISL) at the University of Washington, we are developing a document image analysis toolbox, including a collection of data structures and algorithms to support a variety of applications. An experimental environment has been built to allow developers to develop, evaluate and optimize their algorithms. Appropriate, quantitative performance metrics have been developed for each kind of information a document analysis technique infers. The architecture allows for convenient experimentation to evaluate the performance of different algorithms and sequences of modules. The performance of each algorithm and of the whole system can be evaluated based on these metrics and on test data sets of significant size. A series of document image databases has been created for this purpose. We have constructed a prototype of the system and demonstrated its flexibility and functionality on different applications.
Analysis Problem

The document structure consists of layout structure, logical structure, style and content.

Layout Structure

A layout structure of a document image is a specification of the geometry of the polygons, the content types of the polygons, and the spatial relations of these polygons. Formally, a layout structure is Q = (A, D), where A is a set of homogeneous polygonal areas, and D is a set of dividers.

Logical Structure

Logical structure extraction involves assigning functional labels to each polygon of the page and ordering the text polygons according to their reading order. Formally, a logical structure is Q = (M, R), where M associates polygonal areas with their types of content, and …
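
The layout structure (A, D) described above can be pictured as a small data structure. The following is only a minimal sketch and not the toolbox's actual data structures: the class names `Area`, `Divider` and `LayoutStructure`, the content-type strings, the choice to model a divider as a line segment, and the helper `polygon_area` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Area:
    """One homogeneous polygonal area, i.e. an element of A (hypothetical class)."""
    vertices: List[Point]   # polygon corners, listed in order around the boundary
    content_type: str       # e.g. "text" or "figure"; labels are illustrative only


@dataclass
class Divider:
    """One divider, i.e. an element of D, modeled here as a line segment (assumption)."""
    start: Point
    end: Point


@dataclass
class LayoutStructure:
    """The pair (A, D): a set of areas and a set of dividers."""
    areas: List[Area] = field(default_factory=list)
    dividers: List[Divider] = field(default_factory=list)


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon computed with the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A toy page: a text block above a figure, separated by a horizontal divider.
layout = LayoutStructure(
    areas=[
        Area([(0, 0), (100, 0), (100, 50), (0, 50)], "text"),
        Area([(0, 60), (100, 60), (100, 120), (0, 120)], "figure"),
    ],
    dividers=[Divider((0, 55), (100, 55))],
)

text_areas = [a for a in layout.areas if a.content_type == "text"]
```

Keeping the geometry (vertices) and the content type together on each area mirrors the definition in the text, where the layout structure specifies both the geometry of the polygons and their content types.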

Publication date: 1997